Adapt Project

Prerequisites

The scripts in this repository were written with R (CRAN) version 3.4.3. Some packages must be installed before running the scripts; they are listed in check_packages.r. For example, to install packages from R:

install.packages("RJSONIO", repos = "http://cran.us.r-project.org")
install.packages("jsonlite", repos = "http://cran.us.r-project.org")
…

We recommend installing RStudio to manage the project and run the scripts, although shell scripts can also be used, as detailed below.

Running AR-RULES MINING:

Data Generation and Preparation:

This is the first step of the workflow. It queries the Neo4J database using a JSON specification file and generates the formal context files in two formats: csv and rcf. This step can be run either by calling get_query.r from RStudio or with the bash script extractProcessEvent.sh. The required inputs are: a JSON specification file that contains the Gremlin query, an output csv file, and an output rcf file.

Example:

$> ./extractProcessEvent.sh

This creates the ProcessEvent context files: './contexts/ProcessEventSample.csv' and './contexts/ProcessEventSample.rcf'.

The same step can be run in an R environment:

R prompt> source("get_query.r")

R prompt> get_query("JsonSpecFile", "rcf_context_file", "csv_file")

Example of csv context

In this first step, many other sub-modules are also available:

process_rcf_context.r: this function takes as input an RCF context and generates a csv context file.

Example:

R prompt> source(“process_rcf_context.r”)

R prompt> process_rcf_context("rcf_context_file", "csv_file")

csv_to_rcf.r: this function takes as input a csv context file and generates an RCF context.

load_rcf.r: this function takes as input an RCF context and loads a data structure in memory.

load_csv.r: this function takes as input a csv context and loads a data structure in memory.

check_packages.r: checks the needed packages.

We have also implemented the check_csv_contexts.sh script, which compares a reference csv file (context) with a newly produced csv file. It allows early detection of any abnormal format in the input data.

Example:

$> ./check_csv_contexts.sh
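The checks that check_csv_contexts.sh actually performs are not detailed here. As a rough sketch of the idea only, assuming both contexts are comma-separated files with a header row, a comparison could verify that the headers match and that every row has the expected number of fields. The function name compare_contexts and the sample files are hypothetical.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: compare a reference csv context with a new one.
# This is NOT the project's check_csv_contexts.sh; the file layout
# (header row, comma separator) is an assumption.

compare_contexts() {
  local ref="$1" new="$2"

  # The header rows (first lines) must be identical.
  if [ "$(head -n 1 "$ref")" != "$(head -n 1 "$new")" ]; then
    echo "header mismatch"
    return 1
  fi

  # Every row of the new file must have as many fields as its header.
  if ! awk -F',' 'NR == 1 { n = NF } NF != n { bad = 1 } END { exit bad + 0 }' "$new"; then
    echo "inconsistent field count"
    return 1
  fi

  echo "ok"
}

# Toy data for illustration only.
tmpdir="$(mktemp -d)"
printf 'object,attr_a,attr_b\np1,1,0\np2,0,1\n' > "$tmpdir/reference.csv"
printf 'object,attr_a,attr_b\np3,1,1\n'         > "$tmpdir/new_ok.csv"
printf 'object,attr_a,attr_b\np3,1\n'           > "$tmpdir/new_bad.csv"

compare_contexts "$tmpdir/reference.csv" "$tmpdir/new_ok.csv"
compare_contexts "$tmpdir/reference.csv" "$tmpdir/new_bad.csv" || true
```

With the toy files above, the first call reports a well-formed context and the second reports a row whose field count differs from the header.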

Data Processing:

$> ./Context_scoring_From_CSV.sh: this is the second main shell script. It is run after $> ./extractProcessEvent.sh, since it takes a context_name string, a produced csv context file, an rcf file, an output_scoring_file, and numerical values for MinSup and MinConf.
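In association rule mining, MinSup and MinConf are conventionally the minimum support and minimum confidence a rule must reach to be kept. For a rule A -> B, support is the fraction of objects that have both A and B, and confidence is that count divided by the number of objects that have A. The sketch below illustrates these two measures on a toy context; it is not the project's scoring code. The csv layout (object id followed by 0/1 columns), the function name rule_metrics, and the threshold values are all assumptions made for illustration.

```shell
#!/usr/bin/env bash
# Hypothetical sketch: support and confidence of one rule "A -> B"
# over a binary csv context (assumed layout: object id, then 0/1 columns).

# Usage: rule_metrics FILE COL_A COL_B  (1-based column indices)
# Prints "<support> <confidence>" as fractions with two decimals.
rule_metrics() {
  awk -F',' -v a="$2" -v b="$3" '
    NR == 1 { next }                      # skip the header row
    { total++ }                           # count every object
    $a == 1 { with_a++ }                  # objects having A
    $a == 1 && $b == 1 { with_ab++ }      # objects having both A and B
    END {
      support    = (total  > 0) ? with_ab / total  : 0
      confidence = (with_a > 0) ? with_ab / with_a : 0
      printf "%.2f %.2f\n", support, confidence
    }' "$1"
}

# Toy context for illustration only.
ctx="$(mktemp)"
printf 'object,attr_a,attr_b\np1,1,1\np2,1,0\np3,1,1\np4,0,1\n' > "$ctx"

# Rule attr_a -> attr_b uses columns 2 and 3.
metrics="$(rule_metrics "$ctx" 2 3)"
sup="${metrics% *}"
conf="${metrics#* }"
echo "support=$sup confidence=$conf"

# Keep the rule only if it reaches both (illustrative) thresholds.
min_sup=0.40
min_conf=0.60
if awk -v s="$sup" -v c="$conf" -v ms="$min_sup" -v mc="$min_conf" \
     'BEGIN { exit !(s >= ms && c >= mc) }'; then
  echo "rule kept"
else
  echo "rule discarded"
fi
```

In the toy context, three of the four objects have attr_a and two of those also have attr_b, so the rule has support 2/4 and confidence 2/3 and passes both illustrative thresholds.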

Example of scoring output

Example of ranking the scores

Data Visualisation:

Rule_coloring.r (work in progress): generates an HTML file that colors the rules, and a word cloud of the objects that violate rules, according to their score.

Example of object scores cloud